A fuzzy c-means-type algorithm for clustering of data with mixed numeric and categorical attributes employing a probabilistic dissimilarity functional

Author

  • Sotirios Chatzis

Abstract

The Gath–Geva (GG) algorithm is one of the most popular methodologies for fuzzy c-means (FCM)-type clustering of data comprising numeric attributes. It is based on the assumption that the data derive from clusters of Gaussian form, a much more flexible construction than the spherical-cluster assumption of the original FCM. In this paper, we introduce an extension of the GG algorithm that allows for the effective handling of data with mixed numeric and categorical attributes. Traditionally, fuzzy clustering of such data is conducted by means of the fuzzy k-prototypes algorithm, which merely consists of executing the original FCM algorithm with a different dissimilarity functional, suitable for data with mixed numeric and categorical attributes. In contrast, in this work we provide a novel FCM-type algorithm employing a fully probabilistic dissimilarity functional for handling data with mixed-type attributes. Our approach utilizes a fuzzy objective function regularized by Kullback–Leibler (KL) divergence information, and is formulated on the basis of a set of probabilistic assumptions regarding the form of the derived clusters. We evaluate the efficacy of the proposed approach using benchmark data, and we compare it with competing fuzzy and non-fuzzy clustering algorithms. © 2011 Elsevier Ltd. All rights reserved.
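The abstract does not give the update equations, so the following is only a minimal sketch of how a KL-regularized fuzzy objective with a probabilistic dissimilarity might be implemented for mixed data, under assumptions that are not taken from the paper. It assumes the objective J = Σ_i Σ_k u_ik d_ik + λ Σ_i Σ_k u_ik log(u_ik / π_k), where d_ik is the negative log-likelihood of point i under cluster k; numeric attributes are modeled by a diagonal Gaussian per cluster and each categorical attribute by an independent categorical distribution per cluster. Minimizing J under the constraint Σ_k u_ik = 1 gives u_ik proportional to π_k exp(-d_ik / λ). The function name `kl_fuzzy_mixed`, the parameter `lam`, the farthest-point seeding and the Laplace smoothing are all hypothetical choices made for illustration.

```python
import numpy as np


def kl_fuzzy_mixed(X_num, X_cat, n_clusters, lam=1.0, n_iter=100, tol=1e-6, seed=0):
    """Hypothetical sketch: KL-regularized fuzzy clustering of mixed data.

    Assumptions (not from the paper): diagonal Gaussian per cluster for the
    numeric attributes, independent categorical distribution per cluster for
    each categorical attribute, `lam` weights the KL regularizer.
    X_cat holds integer codes 0..L_j-1 for each categorical column j.
    """
    rng = np.random.default_rng(seed)
    n = X_num.shape[0]
    levels = X_cat.max(axis=0) + 1

    # Farthest-point seeding on the numeric attributes, then hard initial memberships.
    seeds = [int(rng.integers(n))]
    for _ in range(1, n_clusters):
        d2 = ((X_num[:, None, :] - X_num[seeds][None]) ** 2).sum(-1).min(axis=1)
        seeds.append(int(d2.argmax()))
    d2 = ((X_num[:, None, :] - X_num[seeds][None]) ** 2).sum(-1)
    U = np.eye(n_clusters)[d2.argmin(axis=1)]

    for _ in range(n_iter):
        # Cluster parameters as membership-weighted estimates.
        w = U.sum(axis=0) + 1e-12                    # (c,) total membership per cluster
        pi = w / w.sum()                             # cluster priors pi_k
        mu = (U.T @ X_num) / w[:, None]              # (c, p) weighted means
        diff2 = (X_num[:, None, :] - mu[None]) ** 2  # (n, c, p) squared deviations
        var = np.einsum("ik,ikp->kp", U, diff2) / w[:, None] + 1e-6

        # Dissimilarity d_ik = negative log-likelihood of point i under cluster k.
        d = 0.5 * (np.log(2 * np.pi * var)[None] + diff2 / var[None]).sum(-1)
        for j, L in enumerate(levels):
            counts = U.T @ np.eye(L)[X_cat[:, j]]      # (c, L) weighted level counts
            theta = (counts + 1.0) / (w[:, None] + L)  # Laplace-smoothed probabilities
            d -= np.log(theta[:, X_cat[:, j]]).T       # (n, c)

        # Closed-form minimizer: u_ik proportional to pi_k * exp(-d_ik / lam).
        logits = np.log(pi)[None] - d / lam
        logits -= logits.max(axis=1, keepdims=True)  # numerical stability
        U_new = np.exp(logits)
        U_new /= U_new.sum(axis=1, keepdims=True)

        converged = np.abs(U_new - U).max() < tol
        U = U_new
        if converged:
            break
    return U
```

With `lam=1.0` the membership update coincides with the E-step of an EM algorithm for a finite mixture model; larger values of `lam` yield fuzzier memberships. Diagonal covariances keep the sketch short, whereas a Gath–Geva-style variant would estimate a full covariance matrix per cluster.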

Similar resources

Clustering Algorithm for Incomplete Data Sets with Mixed Numeric and Categorical Attributes

The traditional k-prototypes algorithm is well suited to clustering data with mixed numeric and categorical attributes, but it is limited to complete data. In order to handle incomplete data sets with missing values, an improved k-prototypes algorithm is proposed in this paper, which employs a new dissimilarity measure for incomplete data sets with mixed numeric and categorical attributes and a...

Automatic clustering of mixed data using a genetic algorithm

In real-world clustering problems, one often needs to perform cluster analysis on data sets with mixed numeric and categorical values. However, most existing clustering algorithms are efficient only for numeric data rather than for mixed data sets. In addition, traditional methods, for example, the K-means algorithm, usually require the user to provide the number of clusters. In this...

An improved k-prototypes clustering algorithm for mixed numeric and categorical data

Data objects with mixed numeric and categorical attributes are commonly encountered in the real world. The k-prototypes algorithm is one of the principal algorithms for clustering this type of data object. In this paper, we propose an improved k-prototypes algorithm to cluster mixed data. In our method, we first introduce the concept of the distribution centroid for representing the prototype of c...

Coupled Shortest Fuzzy C-Means Clustering Algorithm (CS-FCM) In Mixed Dataset

Nowadays, clustering of mixed datasets is an active research topic in data mining. Most clustering processes are based on numerical attributes, and such processes are not suitable for mixed datasets, which combine numeric and categorical data types. Hence, a technique that handles mixed datasets more efficiently is required. This paper proposes a...

An Improved Semi-Supervised Clustering Algorithm Based on Active Learning

Semi-supervised clustering is one of the major tasks and aims at grouping the data objects into meaningful classes (clusters) such that the similarity of objects within clusters is maximized and the similarity of objects between clusters is minimized. The dataset may sometimes be mixed in nature, that is, it may consist of both numeric and categorical types of data. Naturally these two types of...

Journal:
  • Expert Syst. Appl.

Volume 38

Publication year: 2011